Computer representation of a data tree structure and the associated encoding/decoding methods

ABSTRACT

A memory storing a computerized data array in the form of a table of values stored in the memory as a directed tree representing a set of data. Each data entry in the set is associated with a particular node of the tree, the values representing node ranks of the tree. The node ranks are ordered according to a first total order relation, the values being stored at addresses in the memory representing the node ranks and being ordered according to a second total order relation.

CROSS-REFERENCE TO RELATED APPLICATIONS

This is a U.S. National Phase and claims the benefit of the filing date of PCT/FR03/000576, filed Feb. 21, 2003, and also claims the benefit of priority under 35 U.S.C. §119 of French Application No. 02/02664, filed Feb. 27, 2002, the entire disclosures of which are hereby incorporated herein by reference.

The present invention relates to a computerized data array of a directed tree showing the organization of a set of data, in particular a dictionary. The present invention also concerns a method of encoding said directed tree in said computerized data array. Moreover, said invention concerns a method of encoding a data entry belonging to said data set into an index of said data array. Lastly, the present invention relates to a decoding method for retrieving the corresponding data entry on the basis of said computerized data array.

A number of terms used below will now be defined before discussing the state of the art:

A directed graph (hereafter simply “graph”) denotes a pair G = (S, A), where S is a set of tops (hereafter also termed “nodes”) and A is a subset of S×S called the “set of arcs”.

A path in the graph is an ordered sequence $(s_0, s_1, \ldots, s_n)$ of tops such that $(s_{i-1}, s_i)$ is an arc for $i = 1, \ldots, n$. When $s_n = s_0$ with $n \geq 1$, the path is called a “circuit” or a “cycle”. A graph is called “connected” if any two of its nodes are linked by a path.

A tree is defined as a connected graph without circuits. It may be shown that any two tops of a tree are linked by a unique path. A tree comprises a particular top R such that every top s different from R is linked to R by a path from R to s. This particular top is called the tree's “root”.

For a given top s, a descendant of s is any top $s_d$ of the tree such that there is a path from s to $s_d$. Conversely, for a given top s, an ancestor of s is any top $s_a$ of the tree such that there is a path from $s_a$ to s. An “offspring” of a top s is a descendant $s_f$ of s such that $(s, s_f) \in A$. For any top s of the tree, the subtree of s is the tree with root s comprising all the descendants of s.

Lastly a “leaf” is any tree top lacking a descendant.

Many data processing procedures resort to a tree-shaped data array, in particular procedures to classify, compress or store information.

Depending on the application under consideration, the data may be character chains, sequences of phonemes, waveforms, luminance/chrominance patterns, etc.

Without loss of generality, the data shall hereinafter consist of chains of elementary entities or characters (for instance letters, ideograms, numerals, alphanumeric signs). The set of these possible characters constitutes an alphabet. It is assumed herein that said alphabet is equipped with a total order relation called the “alphabetic order”.

In many applications such as “search engines”, “dictionary search” or “phone book search”, a very large volume of data must be stored and remain accessible, which entails severe constraints under actual operating conditions, in particular for on-line access.

All data must be quickly accessible without requiring large computational power. Moreover, in order to reduce access time, large data volumes must remain in the central memory. To prevent this memory from growing excessively large, the data frequently must be compressed beforehand. Advantageously, the data should be accessible without having to be decompressed, since decompression would further degrade access time.

In the above applications, the data may be processed asymmetrically: the data compression stage may include comparatively lengthy and complex processing, whereas the access (retrieval) stage must be simple and fast. Accordingly, the data may be stored in memory in frozen and compressed form, updates taking place off-line before the data are moved back on-line.

One data organizing geometry is especially well suited for compression, namely the tree defined above. This geometry is present in particular in dictionaries and phone books. In the common sense of the word, a dictionary is a file of data (also called inputs), each data entry consisting of a chain of alphabetic characters, said chains being organized in a tree geometry.

In practice, in a computerized data array, each dictionary data entry is associated with an index. Searching for a chain of characters (or word) in the dictionary amounts to identifying the index of the corresponding word. Accordingly, a text may be represented by a sequence of indices, which is better suited to data processing than the initial representation.

Various kinds of representations or arrays have been proposed in the prior art, in particular:

-   in the form of a dichotomy Table using a Ziv-Lempel compression,
-   in the form of a hash Table,
-   in the form of a lexical tree.

In the case of perfect access, these different sorts of representation offer equivalent performance. “Perfect access” is an access mode that searches the dictionary for the exact chain of characters corresponding to the word to be analyzed, disregarding errors or modifications.

A data array in the form of a lexical tree requires analyzing, i.e. parsing, the chains of characters. FIG. 1 shows an illustrative lexical tree for the following dictionary Δ:

Δ = {abolish, abolition, appeal, attorney, bar, barrister, bench, case, court, crime}.

Note that in the lexical tree the arcs are associated with the characters of the dictionary's words. More specifically, a tag is associated with each tree arc, each tag carrying one character. The lexical tree is the union of all the paths whose skeleton corresponds to one word of the dictionary. The “skeleton” of a path is the chain of characters of the tags of the arcs constituting this path. A dictionary word is also called a dictionary “input”.

Note furthermore that the leaves of the lexical tree are represented by circles, whereas the other tops are shown as disks. The tree root is denoted by R.

The tree is assumed to be indexed, that is, one index is associated with each top. One elementary operation on a lexical tree is to search for the index of the corresponding dictionary input, given a word. This operation entails traversing the tree along the arcs tagged by the consecutive characters composing the word.

More specifically, the search algorithm implements an AnalyzeWord function which returns the index [associated-index(s)] of the dictionary input if the latter is present or, by default, a non-identification code [Unknown-Word-Index]. This function is expressed below in C-like pseudo-code:

    Function IndexNumber AnalyzeWord(chain Word, top Root)
    Begin
        top s = Root;
        For each Character of Word, carry out:
            If Character corresponds to the tag of an arc issuing from s
            Then s = corresponding-descendant(s);
            Otherwise return Unknown-Word-Index;
        If end-of-word(Word) is reached and s is a leaf
        Then return associated-index(s);
        Otherwise return Unknown-Word-Index;
    End.
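For illustration, the pseudo-code above can be rendered as a small C sketch. It assumes a pointer-based lexical tree in the elder-offspring/sibling form described below for FIG. 2B, siblings kept in alphabetical order, and an extra end-of-word arc tagged '\0' so that every dictionary word ends on a leaf. The type Node and the functions add_word, analyze_word and find_child are hypothetical names chosen for this sketch, not part of the original method.

```c
#include <assert.h>
#include <stdlib.h>

#define UNKNOWN_WORD_INDEX (-1)

/* Hypothetical lexical-tree node: `tag` is the character on the arc that
   leads to this node, `elder` is its first offspring and `next` its next
   younger sibling. Siblings are kept in increasing character order. */
typedef struct Node {
    char tag;
    int index;              /* associated index, set on end-of-word leaves */
    struct Node *elder;
    struct Node *next;
} Node;

static Node *new_node(char tag) {
    Node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->tag = tag;
    n->index = UNKNOWN_WORD_INDEX;
    return n;
}

/* Offspring of s reached through the arc tagged c; when `create` is
   non-zero a missing offspring is inserted at its alphabetical place. */
static Node *find_child(Node *s, char c, int create) {
    Node **link = &s->elder;
    while (*link != NULL && (unsigned char)(*link)->tag < (unsigned char)c)
        link = &(*link)->next;
    if (*link != NULL && (*link)->tag == c)
        return *link;
    if (!create)
        return NULL;
    Node *n = new_node(c);
    n->next = *link;
    *link = n;
    return n;
}

/* Adds a word, including its terminating '\0' arc, and labels the leaf. */
void add_word(Node *root, const char *word, int index) {
    Node *s = root;
    const char *p = word;
    do {
        s = find_child(s, *p, 1);
    } while (*p++ != '\0');
    s->index = index;
}

/* AnalyzeWord: follow the arcs tagged by the successive characters, then
   the '\0' arc; fail as soon as no matching arc issues from the node. */
int analyze_word(Node *root, const char *word) {
    Node *s = root;
    const char *p = word;
    do {
        s = find_child(s, *p, 0);
        if (s == NULL)
            return UNKNOWN_WORD_INDEX;
    } while (*p++ != '\0');
    return s->index;
}
```

With "bar", "barrister" and "bench" added, analyze_word finds each of them, while "ba" (a prefix only) and "court" (absent) yield UNKNOWN_WORD_INDEX. Nodes are never freed in this sketch; a complete program would release them.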

Traversing the tree by means of the instruction s = corresponding-descendant(s) assumes that a computerized data array of the lexical tree is available.

In general, a computerized data array of a tree will be required toeasily traverse the tree, to use it and to modify it.

In a first known computerized data array, a tree is represented by an adjacency Table $M = (m_{ij})$, $i = 0, \ldots, n$, $j = 0, \ldots, n$, stored in memory, where $m_{ij} = 1$ when $(s_i, s_j) \in A$ and $m_{ij} = 0$ otherwise.

In a more recent computerized data array, a tree is represented as a set of pointers. In a first known variant, illustrated in FIG. 2A, each node is represented by a value (or index) and a Table of pointers to its offspring nodes. The size of the Table corresponds to the maximum number k of offspring that a tree node may have (the tree is then called a k-tree). FIG. 2A illustrates a 3-tree in this variant.

Encoding the offspring of a node by a Table of pointers has the drawback of consuming much memory space as soon as the tree contains a small number of nodes with many offspring and many other nodes with few offspring, since every node must reserve k pointers. In a second known variant, this difficulty is remedied by using, for a given node, a pointer to one of its offspring nodes, called the elder offspring, and a pointer from the elder offspring to a linked list of its siblings. FIG. 2B illustrates this second variant for a 5-tree.

Pointers allow the tree geometry to be modified rapidly but, on the other hand, they require a relatively large memory. Moreover, detecting whether one node descends from another is not immediate: it requires finding the path linking the two nodes, which, in a pointer representation, requires considerable computing resources. Computation may be substantially reduced by storing the transitive closure of the tree, in which case each node points toward each of its descendants. However, this latter option occupies much more memory space.

The basic objective of the present invention is to create a computerized data array of a tree that requires little memory space and allows the tree to be traversed easily and modified simply.

This objective is attained by a computerized data array of a tree representing the organization of a set of data, in particular of a data dictionary, each data entry being associated with a particular node of said tree, said array comprising a Table of values stored in a memory, said values representing the ranks of the nodes of said tree ordered according to a first total order relation, and the addresses at which said values are stored representing the ranks of the nodes of said tree ordered according to a second total order relation.

Advantageously, the first total order relation is a combination of a descendance order relation, ordering a node relative to its descendants, and a primogeniture order relation, ordering the offspring nodes of a given node.

In a first implementing mode of the present invention, a first tree node is lower than a second tree node according to said first total order relation if the second node is a descendant of the first node or if, the common ancestor of the first and second nodes having a first offspring from which the first node descends or with which it coincides and a second offspring from which the second node descends or with which it coincides, said first offspring is lower than said second offspring according to the primogeniture order relation.

In a second implementing mode of the present invention, a first tree node is higher than a second tree node according to the first total order relation if the second node is a descendant of the first node or if, the common ancestor of the first and second nodes having a first offspring from which the first node descends or with which it coincides and a second offspring from which the second node descends or with which it coincides, said first offspring is lower than said second offspring according to the primogeniture order relation.

Advantageously, the second total order relation is a combination of the inverse of said descendance order relation and of said primogeniture order relation.

In a first variant of the present invention, a first tree node is lower than a second tree node according to said second total order relation if the first node is a descendant of the second node or if, the common ancestor of the first and second nodes having a first offspring from which the first node descends or with which it coincides and a second offspring from which the second node descends or with which it coincides, said first offspring is lower than said second offspring according to said primogeniture order relation.

In a second variant of the present invention, a first tree node is higher than a second tree node according to said second total order relation if the first node is a descendant of the second node or if, the common ancestor of the first and second nodes having a first offspring from which the first node descends or with which it coincides and a second offspring from which the second node descends or with which it coincides, said first offspring is lower than said second offspring according to said primogeniture order relation.

If the data are sequences of characters of an alphabet equipped with an alphabetic order, each arc of said tree being associated with a character of at least one data entry, the primogeniture order relation between two offspring of the same node may be determined by the alphabetic order relation between the characters associated with the respective arcs linking said node to its two offspring.

The present invention also relates to a method of encoding a directed tree representing the organization of a set of data, in particular a dictionary, each data entry of said set being associated with a particular node of said tree, wherein each node of said tree is assigned a first and a second index, the first index representing the rank of the node according to a first total order relation ordering the nodes of said tree, the second index representing the rank of the node according to a second total order relation, the first total order relation being a combination of a descendance order relation, ordering a node relative to its descendants, and of a primogeniture order relation, ordering the offspring of the same node, the second total order relation being a combination of the inverse of said descendance order relation and of said primogeniture order relation.

Advantageously, the encoding method comprises recursive calculation stages yielding, for any tree node, the size of the subtree issuing from said node.

For a first and a second offspring of the same node, called the parent node, said first and second offspring being adjacent in the list of offspring ordered according to said primogeniture order relation, the calculation stage determines the first index of the second offspring from the first index of the first offspring and the size of the subtree issuing from the first offspring, and the second index of the second offspring from the second index of the first offspring and the size of the subtree issuing from the second offspring.

Said calculation stage determines the first index of the offspring ranked first in said list from the first index of said parent node, and the second index of said parent node from the second index of the offspring ranked last in said list.

Said calculation stage also determines the size of the subtree issuing from said parent node from the sum of the sizes of the subtrees issuing from its offspring.

Advantageously, said encoding method operates on a first array of said tree using pointers, such that, for a given node, a first kind of pointer, following the descendance relation, provides one offspring node (the elder offspring) and a second kind of pointer provides the list of its other offspring.

The present invention is also defined by a method of encoding an input data entry belonging to a set of data organized according to a directed tree geometry, in particular a data dictionary, said data consisting of sequences of characters of an alphabetically ordered alphabet, each data entry being associated with a given node of said tree and each arc being associated with a character, wherein said tree is represented by the above computerized data array, the tree being traversed from node to node along a path starting at the root, said input data entry being analyzed character by character, the node following the current node of said path being selected from among the offspring of said current node, the selection being implemented by a sequence of comparison stages, each of which compares the current character of said input data entry with the character associated with the arc linking the current node to one of its offspring, the traversal being interrupted only after said input data entry has been fully analyzed, said method yielding the encoded value of said input data entry in the form of an index which is the address, in the Table of said computerized data array, representing the last node of said path.

Moreover, the present invention is defined by a method of decoding an index representing a data entry belonging to a set of data arranged in a directed tree geometry, in particular a data dictionary, said data consisting of sequences of characters of an alphabetically ordered alphabet, each data entry being associated with a given node of said tree and each arc being associated with a character, wherein said tree is represented by means of the above computerized data array, the tree being traversed along a path starting at the root, the node following the current node of said path being selected from among the offspring of the latter, said selection being implemented by a sequence of comparison stages, each of which compares said index with an index representing one of said offspring in said computerized data array, said method providing, as the decoded data, the chain of characters associated with the arcs constituting said path.

The above features of the present invention, as well as others, are elucidated in the following description of certain modes of implementation, given with reference to the appended drawings, in which:

FIG. 1 shows an illustrative lexical tree,

FIG. 2A shows a first computerized data array of a tree using pointers,

FIG. 2B shows a second computerized data array of a tree using pointers,

FIG. 3A illustrates a tree encoding method of a first embodiment of the invention,

FIG. 3B shows a first variation of the computerized data array of the tree of FIG. 3A,

FIG. 3C shows a second variation of the computerized data array of the tree of FIG. 3A,

FIG. 4 illustrates a tree section before indexing by prefix rank andpostfix rank.

The basic concept of the present invention is to create a novel computerized data array of a tree on the basis of a total order relation expressing the dependency relations between the nodes.

The dependency relation between nodes induces a partial order relation on the set of tree nodes. Illustratively, if for two tree nodes s₁ and s₂ one sets s₁ > s₂ whenever s₂ is a descendant of s₁, an order relation is obtained. However, this order is only partial because not all nodes of the tree can be compared in this manner (for instance, the offspring of the same node cannot).

A total order relation may be constructed on the nodes of a tree provided it is known how to order the offspring of one node. The order in which the offspring of the same node are ranked shall conventionally be called the primogeniture order. For a lexical tree whose arc tags contain alphabetical characters, it may be agreed that, of two offspring s₁ and s₂ of one node S, s₁ precedes s₂ in the primogeniture order (s₁ being the elder) if the character of the tag associated with the arc (S, s₁) precedes that of the tag associated with the arc (S, s₂). In other words, the alphabetic order of the tags induces a primogeniture order on the offspring of the same node.

The combination of the partial descendance order relation (hereafter denoted $\underset{D}{<}$ or $\underset{D}{>}$) with the primogeniture order relation (hereafter denoted $\underset{P}{<}$ or $\underset{P}{>}$), where $a \underset{D}{<} b$ means that b is a descendant of a and $a' \underset{P}{<} b'$ means that a′ is an elder brother of b′, yields a total order relation on the whole set of nodes. This combination may be attained in several ways:

-   prefix order relation, conventionally denoted $\underset{pref}{<}$:

$a \underset{pref}{<} b$ if $a \underset{D}{<} b$ or $a' \underset{P}{<} b'$, where a′ and b′ are the offspring of the common ancestor of a and b such that a is a descendant of or coincides with a′ and b is a descendant of or coincides with b′.

In other words, node a is lower than node b in the sense of the prefix order if b is a descendant of a or a′ is an elder brother of b′.

-   inverse prefix order relation, conventionally denoted $\underset{ferp}{<}$:

$b \underset{ferp}{<} a$ if $a \underset{D}{<} b$ or $a' \underset{P}{<} b'$

-   postfix order relation, conventionally denoted $\underset{post}{<}$:

$a \underset{post}{<} b$ if $b \underset{D}{<} a$ or $a' \underset{P}{<} b'$

In other words, node a is lower than node b in the sense of the postfix order if a is a descendant of b or a′ is an elder brother of b′.

-   inverse postfix order relation, conventionally denoted $\underset{tsop}{<}$:

$b \underset{tsop}{<} a$ if $b \underset{D}{<} a$ or $a' \underset{P}{<} b'$

Because, of two arbitrary tree nodes, either one descends from the other or both descend from a common ancestor, the order relations defined above are total order relations.

The order relations $\underset{pref}{<}$ (or $\underset{ferp}{<}$) and $\underset{post}{<}$ (or $\underset{tsop}{<}$) therefore allow the whole set S of the tree's n + 1 nodes to be totally ordered. In other words, a “ranking” function from S onto [0, n] may be associated with each of these order relations, for instance:

PrefixRank: S → [0, n], such that $s_1 \underset{pref}{<} s_2$ if and only if PrefixRank(s₁) < PrefixRank(s₂);

PostfixRank: S → [0, n], such that $s_1 \underset{post}{<} s_2$ if and only if PostfixRank(s₁) < PostfixRank(s₂).

PrefixRank and PostfixRank are order isomorphisms (order-preserving bijections). Ranking functions InversePrefixRank and InversePostfixRank may be defined in the same manner using the order relations $\underset{ferp}{<}$ and $\underset{tsop}{<}$:

InversePrefixRank: S → [0, n], such that $s_1 \underset{ferp}{<} s_2$ if and only if InversePrefixRank(s₁) < InversePrefixRank(s₂);

InversePostfixRank: S → [0, n], such that $s_1 \underset{tsop}{<} s_2$ if and only if InversePostfixRank(s₁) < InversePostfixRank(s₂).
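The ranking functions can be computed with one depth-first traversal, eldest offspring first: numbering nodes on entry gives the prefix order (an ancestor precedes its descendants, elder branches precede younger ones), and numbering them on exit gives the postfix order (descendants precede their ancestor). The following is a minimal sketch assuming a hypothetical 7-node tree stored as elder-offspring and younger-sibling index arrays; the names elder, younger, rank_nodes and compute_ranks are illustrative only. The inverse ranks are obtained as n − rank, since each inverse order simply reverses the corresponding direct order.

```c
#include <assert.h>

#define N 7            /* number of nodes, n + 1 */
#define NONE (-1)

/* Hypothetical 7-node tree in elder-offspring / younger-sibling form:
   root 0 has offspring 1, 2, 6 (eldest first); node 2 has offspring 3, 4, 5. */
static const int elder[N]   = {1, NONE, 3, NONE, NONE, NONE, NONE};
static const int younger[N] = {NONE, 2, 6, 4, 5, NONE, NONE};

int prefix_rank[N], postfix_rank[N];

/* Depth-first walk, eldest offspring first: a node is ranked in prefix
   order before its descendants and in postfix order after them. */
static void rank_nodes(int s, int *next_prefix, int *next_postfix) {
    prefix_rank[s] = (*next_prefix)++;
    for (int c = elder[s]; c != NONE; c = younger[c])
        rank_nodes(c, next_prefix, next_postfix);
    postfix_rank[s] = (*next_postfix)++;
}

void compute_ranks(void) {
    int next_prefix = 0, next_postfix = 0;
    rank_nodes(0, &next_prefix, &next_postfix);
    assert(next_prefix == N && next_postfix == N);   /* every node ranked once */
}

/* Reversing a total order on n + 1 elements maps rank r to n - r. */
int inverse_prefix_rank(int s)  { return (N - 1) - prefix_rank[s]; }
int inverse_postfix_rank(int s) { return (N - 1) - postfix_rank[s]; }
```

For this tree, node 2 receives the pair (PrefixRank, PostfixRank) = (2, 4) and the root receives (0, 6): the root is first in prefix order and last in postfix order.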

In a first mode of implementation, the computerized data array is constructed using a bijection T of [0, n] onto [0, n] defined as follows:

T: [0, n] → [0, n], T = PostfixRank ∘ PrefixRank⁻¹.

The bijection T⁻¹ is used in a variation of this first embodiment mode.

In the same manner, other composite bijections may be used in other embodiment modes of the invention, namely InversePostfixRank ∘ PrefixRank⁻¹, PostfixRank ∘ InversePrefixRank⁻¹ or InversePostfixRank ∘ InversePrefixRank⁻¹, or, in variants thereof, the inverses of these bijections.

For the sake of simplicity, the discussion of the present invention is restricted to the bijections T and T⁻¹, it being understood that the other bijections are equally applicable.

Said bijection T may be represented in a computer in the form of a Table of values in memory, the postfix rank of a node being stored at an address representing this node's prefix rank.

Similarly, the bijection T⁻¹ may be represented in the form of a second Table of values stored in memory, the prefix rank of a node being stored at an address representing this node's postfix rank.

An illustration shall elucidate the significance and application ofthese bijections.

FIG. 3A illustrates a tree whose nodes have been indexed by their prefix ranks (bold and underlined) and by their postfix ranks (italics). The primogeniture order relation, induced for instance by an alphabetic order on the tags in the case of a lexical tree, is shown conventionally increasing from left to right. In this manner, each node s is associated with a pair (PrefixRank(s), PostfixRank(s)).

These pairs are advantageously stored in a Table by means of the bijection T (FIG. 3B) or T⁻¹ (FIG. 3C). In FIG. 3B, the postfix rank values are stored at the addresses given by the corresponding prefix rank values. Conversely, in FIG. 3C, the prefix rank values are stored at the addresses given by the corresponding postfix rank values.
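Building either Table from the rank pairs takes a single pass. The sketch below is a minimal illustration assuming the pairs are already known (for example from a traversal); the function name build_tables is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* From the (PrefixRank, PostfixRank) pair of each of the `count` = n + 1
   nodes, fill t[prefix] = postfix (Table of FIG. 3B, bijection T) and
   t_inv[postfix] = prefix (Table of FIG. 3C, bijection T^-1). */
void build_tables(const int prefix[], const int postfix[], size_t count,
                  int t[], int t_inv[]) {
    for (size_t s = 0; s < count; s++) {
        assert(prefix[s] >= 0 && (size_t)prefix[s] < count);
        assert(postfix[s] >= 0 && (size_t)postfix[s] < count);
        t[prefix[s]] = postfix[s];
        t_inv[postfix[s]] = prefix[s];
    }
}
```

Only one of the two Tables needs to be kept: it holds n + 1 integers and no pointers, the node identity being carried by the address itself.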

A first advantage of the computerized data array of the invention is that it takes up a memory space of only n + 1 values, i.e. the size of the tree, whereas a conventional pointer representation (FIGS. 2A and 2B) requires at least twice that memory space.

A second, essential advantage of this computerized data array is that it allows the dependency relation between two tree nodes to be tested very simply: to determine whether a node s₂ depends on a node s₁, it suffices to compare PrefixRank(s₁) with PrefixRank(s₂) on the one hand and PostfixRank(s₁) with PostfixRank(s₂) on the other:

-   s₂ depends on s₁ if and only if PrefixRank(s₂) > PrefixRank(s₁) and PostfixRank(s₂) < PostfixRank(s₁).
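The dependency test reduces to two integer comparisons. A minimal sketch, assuming each node is handled as its rank pair; the type RankPair and the function depends_on are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* A node identified by its pair (PrefixRank, PostfixRank). */
typedef struct {
    int prefix;
    int postfix;
} RankPair;

/* s2 depends on (descends from) s1 iff s2 comes after s1 in prefix order
   and before it in postfix order: two comparisons, no traversal. */
bool depends_on(RankPair s2, RankPair s1) {
    assert(s1.prefix >= 0 && s2.prefix >= 0);
    return s2.prefix > s1.prefix && s2.postfix < s1.postfix;
}
```

With the pairs quoted for FIG. 3A, depends_on((RankPair){5, 1}, (RankPair){1, 5}) is true and depends_on((RankPair){22, 21}, (RankPair){1, 5}) is false. A node does not depend on itself under this test.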

Accordingly, in FIG. 3A, the node represented by the pair (PrefixRank, PostfixRank) = (5, 1) depends on the node represented by the pair (1, 5), but not on the node represented by the pair (22, 21).

In the same manner, the descendants or ancestors of a given node are easily determined using the Table of FIG. 3B. For instance, to determine the list of descendants of the node (8, 12), it is enough to analyze the Table in the direction of increasing addresses, starting from address 8, and to search among the stored data for those lower than the postfix rank 12 (in this instance 6, 10, 7, 8, 9, 11). These values are the postfix ranks of that node's descendants. To ascertain the list of ancestors of the node (8, 12), it is enough to analyze the Table in the direction of decreasing addresses, starting from address 8, and to search among the stored data for those larger than the postfix rank 12 (here 19, 22). These values are the postfix ranks of that node's ancestors.
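Both scans can be sketched as follows over the Table of FIG. 3B, held as an int array t with t[prefix] = postfix. This is a minimal sketch; the function names and the output-array convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* t[p] holds the postfix rank of the node whose prefix rank is p
   (Table of FIG. 3B); the table has `size` = n + 1 entries.
   Both functions write postfix ranks into `out` and return how many. */

/* Descendants: scan towards increasing addresses for smaller postfix ranks. */
size_t descendants(const int t[], size_t size, size_t prefix, int out[]) {
    size_t k = 0;
    assert(prefix < size);
    for (size_t a = prefix + 1; a < size; a++)
        if (t[a] < t[prefix])
            out[k++] = t[a];
    return k;
}

/* Ancestors: scan towards decreasing addresses for larger postfix ranks;
   the father is found first and the root last. */
size_t ancestors(const int t[], size_t size, size_t prefix, int out[]) {
    size_t k = 0;
    assert(prefix < size);
    for (size_t a = prefix; a-- > 0; )
        if (t[a] > t[prefix])
            out[k++] = t[a];
    return k;
}
```

Because a node's descendants occupy consecutive prefix ranks, the first scan could also stop at the first value that is not smaller than t[prefix]; the sketch keeps the full scan described in the text.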

A dual procedure is used in the Table of FIG. 3C. Returning to theprevious illustration, the list of descendants may be determined bymerely analyzing the Table in the direction of decreasing addressesstarting with address 12 and by searching among the stored data forthose which are higher than the prefix 8 (here 14, 10, 13, 12, 11, 9).These values denote the prefix ranks of the particular node'sdescendants. Again, to ascertain the list of ancestors of (8, 12), itsuffices to analyze the Table in the direction of increasing addresses,starting with address 12, and to search among the stored data for thoseless than the prefix 8 (here 7, 0).

A third advantage offered by the computerized data array of the invention is that it allows the tree to be traversed easily, either from the root toward the leaves, for instance when analyzing a chain of characters (word) using a lexical tree, or from the leaves toward the root, such a traversal being carried out, for instance, when generating a chain of characters from a node's index.

Traversing the tree from the root toward the leaves requires being able to determine the offspring of a given node. As will now be shown, the Table of FIG. 3B (or that of FIG. 3C) allows said offspring to be found easily.

Consider the Table of FIG. 3B and assume that the navigation algorithm searches for the offspring of the node (8, 12). Starting from address 8, the Table is analyzed in the direction of increasing addresses and, as above, the data less than 12 are searched for. Once a data entry x less than 12 has been retained, the subsequent data less than x are no longer considered; in other words, the analysis continues until a data entry x′ larger than x (but still less than the initial value 12) is found, which is retained in turn. This procedure is repeated until the end of the Table. In the present instance, the value 6 is encountered first and retained (6 < 12), then the value 10, which is also retained (6 < 10 < 12). The following values 7, 8, 9 are not retained because, although less than 12, they do not exceed the last retained value 10. Next, the value 11 is retained (10 < 11 < 12), but the following values are not, since they are larger than 12.
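The offspring search above can be sketched as a scan that keeps a running "last retained value". This is a minimal sketch over the Table of FIG. 3B stored as an int array t with t[prefix] = postfix; the function name offspring is hypothetical. It works because each retained child is followed in prefix order by its own descendants, whose postfix ranks are all smaller than that child's and are therefore skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Offspring of the node whose prefix rank is `prefix`, read from the
   Table t of FIG. 3B (t[p] = postfix rank of the node of prefix rank p).
   Writes the offspring's postfix ranks, eldest first, into out. */
size_t offspring(const int t[], size_t size, size_t prefix, int out[]) {
    size_t k = 0;
    int last = -1;                  /* last retained value; ranks are >= 0 */
    assert(prefix < size);
    for (size_t a = prefix + 1; a < size; a++) {
        if (t[a] > last && t[a] < t[prefix]) {
            out[k++] = t[a];        /* a new, younger offspring */
            last = t[a];            /* its descendants are smaller: skipped */
        }
    }
    return k;
}
```

On the hypothetical 7-entry Table {6, 0, 4, 1, 2, 3, 5}, the root (address 0) yields the offspring postfix ranks 0, 4 and 5, skipping 1, 2, 3 exactly as 7, 8, 9 are skipped in the text.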

A dual approach is used with the Table of FIG. 3C. Using the previous example, the Table is analyzed in the direction of decreasing addresses, starting from address 12, and the stored data higher than the prefix rank 8 are searched for. Once a data entry x higher than 8 has been retained, the subsequent data higher than x are ignored; in other words, the analysis continues until a data entry y less than x (but still higher than the initial value 8) is found. This procedure is iterated until the beginning of the Table is reached. In the present instance, the value 14 is encountered first and retained (14 > 8), then the value 10, which is also retained (8 < 10 < 14). The following values 13, 12, 11 are ignored, although higher than 8, because they are not less than the last retained value 10. Next, the value 9 is retained (8 < 9 < 10), but the following values are not, since they are less than 8.

This method of determining a given node's offspring is appropriate onlyfor small trees. As shown further below, however, as regards largertrees, the prefix/postfix ranks of said offspring may be calculated morerapidly in a direct manner.

Note that if, instead of being constructed from the bijections T and T⁻¹, the Tables had been constructed, for instance, from the bijections InversePostfixRank ∘ PrefixRank⁻¹, PostfixRank ∘ InversePrefixRank⁻¹ or InversePostfixRank ∘ InversePrefixRank⁻¹, or from the inverses of these bijections, the offspring of a given node could have been determined in a similar manner, possibly at the cost of reversing the direction of analysis and/or the direction of the inequalities.

Likewise, traversing a tree from its leaves to its root presupposes that the father of a given node can be determined. Assume now that the navigation algorithm searches for the father of the node (8, 12), and first consider the Table of FIG. 3B. Starting from address 8, the Table is analyzed in the direction of decreasing addresses. The first data entry encountered that is higher than 12 gives the postfix rank of the father of the node in question (here 19).

It is understood that a dual procedure is implemented in FIG. 3C. Inthis instance the Table is analyzed in the direction of increasingaddresses, starting from the address 12. The first data entry beingencountered that is less than 8 provides the prefix index of the fatherof the pertinent node (here 7).
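Both father searches can be sketched in a few lines. This is a minimal sketch assuming the Tables of FIGS. 3B and 3C are held as int arrays t and t_inv; the names father_postfix and father_prefix, and the use of -1 for "no father" (the root), are assumptions. Between the father and the node, in either scan direction, lie only sibling subtrees, whose ranks never satisfy the inequality, so the first match is the father.

```c
#include <assert.h>
#include <stddef.h>

/* Father via t (FIG. 3B, t[p] = postfix rank): the first value above
   t[prefix] met towards decreasing addresses is the father's postfix
   rank. Returns -1 for the root. */
int father_postfix(const int t[], size_t size, size_t prefix) {
    assert(prefix < size);
    for (size_t a = prefix; a-- > 0; )
        if (t[a] > t[prefix])
            return t[a];
    return -1;
}

/* Dual search via t_inv (FIG. 3C, t_inv[q] = prefix rank): the first
   value below t_inv[postfix] met towards increasing addresses is the
   father's prefix rank. Returns -1 for the root. */
int father_prefix(const int t_inv[], size_t size, size_t postfix) {
    assert(postfix < size);
    for (size_t a = postfix + 1; a < size; a++)
        if (t_inv[a] < t_inv[postfix])
            return t_inv[a];
    return -1;
}
```

Repeating either call until it returns -1 walks from a leaf up to the root, which is the traversal needed to regenerate a word from a node's index.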

Transforming an arbitrary tree into its computerized data array, an operation termed “tree encoding”, first requires the nodes to be indexed. To encode the tree in the form of the computerized data array of the present invention, the nodes must be indexed by means of the functions PrefixRank and PostfixRank (or the other equivalent functions cited above). Without loss of generality, the exposition of the indexing method of the present invention is restricted to these two functions.

The indexing method operates on a conventional computerized tree data array using pointers, as illustrated in FIG. 2B. This conventional data array is obtained in a known manner from a file of the dictionary inputs. The vertical pointer chaining corresponds to an input relation of the inputs. The horizontal chaining of the siblings of one node follows the primogeniture order induced by sorting the tags.

The root's prefix rank is initialized to 0, and the tree is then traversed from the root along the eldest-son pointers until a leaf is reached (the leftmost leaf, under the conventions of the array chosen in this instance). The postfix rank of this leaf is initialized to 0.

Assume a node S of the tree having offspring s₀, s₁, …, s_p, which are arranged in increasing primogeniture order (that is, in the order of their horizontal chaining), as illustrated in FIG. 4. The following relations ensue (for 0 ≤ i < p):

$\mathrm{PrefixRank}(s_0) = \mathrm{PrefixRank}(S) + 1$
$\mathrm{PrefixRank}(s_{i+1}) = \mathrm{PrefixRank}(s_i) + \Gamma(s_i)$
$\mathrm{PostfixRank}(s_{i+1}) = \mathrm{PostfixRank}(s_i) + \Gamma(s_{i+1})$
$\mathrm{PostfixRank}(S) = \mathrm{PostfixRank}(s_p) + 1$
$\Gamma(S) = \sum_{i=0}^{p} \Gamma(s_i) + 1$

where Γ(s) is the size, in number of nodes, of the subtree issuing from s.
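To make the two ranks concrete, they can be computed directly with two counters during a single depth-first walk of a pointer array of the FIG. 2B kind, and the relations above can then be checked on the result. The sketch below is illustrative only: the index arrays first_child[] and next_sibling[] (with −1 for a missing pointer) stand in for the pointers and, like the Ranker structure, are assumptions of this example.

```c
#include <assert.h>

/* Hypothetical pointer array: node ids index first_child[] (vertical
 * chaining) and next_sibling[] (horizontal chaining, primogeniture order);
 * -1 means "no pointer". Outputs are indexed by node id. */
typedef struct {
    const int *first_child;
    const int *next_sibling;
    int *prefix_rank;   /* output: PrefixRank of each node */
    int *postfix_rank;  /* output: PostfixRank of each node */
    int *size;          /* output: Gamma, size of the subtree issuing from the node */
    int next_prefix;    /* preorder counter, starts at 0 (root) */
    int next_postfix;   /* postorder counter, starts at 0 (leftmost leaf) */
} Ranker;

static void rank_nodes(Ranker *r, int s)
{
    assert(s >= 0);
    r->prefix_rank[s] = r->next_prefix++;     /* a node precedes its offspring */
    int size = 1;
    for (int c = r->first_child[s]; c != -1; c = r->next_sibling[c]) {
        rank_nodes(r, c);
        size += r->size[c];                   /* Gamma(S) = sum Gamma(s_i) + 1 */
    }
    r->postfix_rank[s] = r->next_postfix++;   /* a node follows its offspring */
    r->size[s] = size;
}
```

The counters implement the definitions literally; the EncodingTree function avoids them by deriving both ranks from subtree sizes.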

Indexing by prefix rank and by postfix rank may be carried out in one pass from the tree root by recursively calling a function that, for a given top s, returns the size Γ(s) of the subtree issuing from s. This function is listed below in C-like pseudo-code:

Function size EncodingTree (top S, rank Prefix, rank Postfix, Table Bijection)
Begin
  OffspringSubTreesSize = 0;
  If S is a leaf then { Bijection[Prefix] = Postfix; Return 1; }
  ChildPrefix = Prefix + 1; // prefix rank of the eldest son
  For all offspring s of S, in primogeniture order, carry out {
    SubTreeSize = EncodingTree (s, ChildPrefix, Postfix + OffspringSubTreesSize, Bijection);
    ChildPrefix += SubTreeSize;
    OffspringSubTreesSize += SubTreeSize;
  }
  Bijection[Prefix] = Postfix + OffspringSubTreesSize;
  Return OffspringSubTreesSize + 1;
End.

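Be it noted that in the above program the SubTreeSize variable is the size of the subtree issuing from the current offspring node s, and the variable OffspringSubTreesSize is the cumulative size of the subtrees issuing from the offspring nodes already analyzed. The Postfix argument is the postfix rank of the first node, in postfix order, of the subtree issuing from S (its leftmost leaf), so that the whole tree is encoded by the call EncodingTree (Root, 0, 0, Bijection).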

The EncodingTree function directly creates a computerized data array in the form of an index Table of the kind shown in FIG. 3B, stored in memory. Once this computerized data array has been created, the initial pointer array, no longer needed, is eliminated.
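A compilable C rendering of EncodingTree might look as follows. It is a sketch under stated assumptions: the pointer array is modelled by hypothetical index arrays first_child[] and next_sibling[] (−1 meaning no pointer), and the postfix_base parameter plays the role of the Postfix argument, i.e. the number of nodes preceding the subtree of s in postfix order. A separate child_prefix variable keeps the node's own prefix rank intact for the final store into the Table.

```c
#include <assert.h>

/* Fills bijection[prefix rank] = postfix rank for the subtree issuing from s
 * (a FIG. 3B-style Table) and returns Gamma(s). `prefix` is the prefix rank
 * of s; `postfix_base` is the postfix rank of the leftmost leaf of s. */
int encoding_tree(const int *first_child, const int *next_sibling,
                  int s, int prefix, int postfix_base, int *bijection)
{
    assert(s >= 0);
    int offspring_size = 0;            /* OffspringSubTreesSize */
    int child_prefix = prefix + 1;     /* PrefixRank(s0) = PrefixRank(S) + 1 */
    for (int c = first_child[s]; c != -1; c = next_sibling[c]) {
        int sub = encoding_tree(first_child, next_sibling, c, child_prefix,
                                postfix_base + offspring_size, bijection);
        child_prefix += sub;           /* PrefixRank(s_{i+1}) = PrefixRank(s_i) + Gamma(s_i) */
        offspring_size += sub;
    }
    /* A leaf receives postfix_base; an inner node comes after all of its
     * descendants in postfix order. */
    bijection[prefix] = postfix_base + offspring_size;
    return offspring_size + 1;
}
```

Calling encoding_tree(first_child, next_sibling, root, 0, 0, table) encodes the whole tree, mirroring the initial values given for the root and the leftmost leaf.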

Below, for the sake of simplicity, the discussion covers a dictionary organized as a lexical tree, each dictionary entry corresponding to one leaf of the tree. As shown above, the encoding method of the present invention yields a computerized data array in the form of a Table having the same size as the tree.

This computerized data array is advantageously used to search, for a given character chain, for the index of the corresponding dictionary entry. The index is conventionally taken as the leaf's prefix rank (alternatively, its inverse prefix rank might also be selected). The index search relies on a first method of the present invention for traversing the tree from its root toward the leaves.

Conversely, this computerized data array is advantageously used to generate the corresponding character chain from a dictionary entry's index. The generation of the character chain relies on a second method of the present invention for traversing the tree from the root toward the leaves.

First, the index search for a character chain C is considered. The tree is traversed from the root along the arcs whose tags bear the consecutive characters of C.

Advantageously, the first method of the present invention for traversing the tree from the root toward the leaves operates as follows:

When arriving at the first offspring s₀ of a node S, the initialization is

$\mathrm{PrefixRank}(s_0) = \mathrm{PrefixRank}(S) + 1$

and the size Γ(s₀) of the subtree issuing from s₀ is calculated as

$\Gamma(s_0) = \mathrm{PostfixRank}(s_0) - \mathrm{LastPostfix}$

where LastPostfix is the postfix rank of the last node whose subtree traversal was dropped (in other words, the postfix rank of the root of the last subtree lopped off during traversal). Illustratively, if the node of postfix rank 5 in FIG. 3A were not retained because the arc linking the root to this node does not bear the searched-for character, the subtree issuing from this node would not be traversed and LastPostfix = 5. Traversal then continues through the node of postfix rank 19 and, if successful, through the node of postfix rank 12. The size of the subtree issuing from this node (the first offspring s₀ of the node S of postfix rank 19) is indeed 12 − 5 = 7.

Next, the prefix ranks of the consecutive offspring s_i of S are determined using the following recursion relations:

$\mathrm{PrefixRank}(s_{i+1}) = \mathrm{PrefixRank}(s_i) + \Gamma(s_i)$
$\Gamma(s_{i+1}) = \mathrm{PostfixRank}(s_{i+1}) - \mathrm{PostfixRank}(s_i)$

For each analyzed offspring s_i, a test is performed to determine whether the current character c of C equals the character borne by the tag of the arc joining S to s_i. If not, the analysis continues with the following offspring s_{i+1}, and so on, until the current character has been found or all offspring of S have been analyzed.

When arriving at a given node S, it is not known beforehand how many offspring it has. To overcome this, the size Γ(S) of the subtree issuing from S is advantageously stored. Then, while exploring the offspring s_i consecutively, the variable OffspringSubTreesSize is updated:

$\mathrm{OffspringSubTreesSize} = \sum_{j=0}^{i} \Gamma(s_j)$

It is then known that all offspring of S have been analyzed when

$\mathrm{OffspringSubTreesSize} = \Gamma(S) - 1$
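The offspring loop described above can be sketched in C as below, reading only a Table of the FIG. 3B kind (assumed here to be an int array bijection[] indexed by prefix rank). The function name and the output arrays are hypothetical; last_postfix is LastPostfix on arrival at S (−1 at the root), and the loop stops on the criterion OffspringSubTreesSize = Γ(S) − 1. Every offspring is treated as if it were lopped off, so that all of them are listed.

```c
#include <assert.h>

/* Lists the offspring of the node S of prefix rank prefix_S and subtree size
 * size_S. Writes each offspring's prefix rank and subtree size, returns the
 * number of offspring. */
int list_offspring(const int *bijection, int prefix_S, int size_S,
                   int last_postfix, int *child_prefix, int *child_size)
{
    assert(size_S >= 1);
    int count = 0;
    int offspring_size = 0;                  /* OffspringSubTreesSize */
    int prefix = prefix_S + 1;               /* eldest son of S */
    while (offspring_size < size_S - 1) {    /* stops when it reaches Gamma(S) - 1 */
        int postfix = bijection[prefix];
        int size = postfix - last_postfix;   /* Gamma(s_i) */
        child_prefix[count] = prefix;
        child_size[count] = size;
        ++count;
        last_postfix = postfix;              /* subtree of s_i is skipped */
        offspring_size += size;
        prefix += size;                      /* PrefixRank(s_{i+1}) */
    }
    return count;
}
```

For a root of subtree size 9 whose Table reads {8, 2, 0, 1, 3, 7, 6, 4, 5}, list_offspring(table, 0, 9, -1, ...) finds three offspring at prefix ranks 1, 4 and 5, with subtree sizes 3, 1 and 4.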

If all offspring have been analyzed without the current character being found, the complete character chain C does not correspond to a dictionary entry (although a portion of C may). If the current character is found for one of the offspring s_i, then S = s_i, and the search cycle starts over with the next character. This procedure is iterated until a leaf of the tree has been reached. The searched-for index is the prefix rank of this leaf. Advantageously, in order to be able to search for words that are prefixes of each other, for instance “bar” and “barrister” in FIG. 1, an end-of-word marker, for instance a space character, may be appended to each word. In this case all the leaves of the tree bear end-of-word markers.

In this manner a TreeTraversal function may be defined that, from a character chain AnalyzeWord, returns the prefix rank of the leaf reached at the end of the traversal. This function makes use of the computerized tree data array of the invention. Its C-like pseudo-code is stated below:

Function PrefixIndex TreeTraversal (chain AnalyzeWord, Table Bijection)
Begin
  S = Root; Prefix(S) = 0; // root prefix index
  SubTree(S)Size = Bijection[Prefix(S)] + 1;
  LastPostfix = −1; CharacterInProgress = first character of AnalyzeWord;
  As long as SubTree(S)Size is different from 1 {
    s = eldest son of S; Prefix(s) = Prefix(S) + 1;
    OffspringSubTreesSize = 0; Found = false;
    As long as OffspringSubTreesSize < SubTree(S)Size − 1 and not Found {
      Postfix(s) = Bijection[Prefix(s)];
      SubTree(s)Size = Postfix(s) − LastPostfix;
      If CharacterInProgress corresponds to the tag of the arc from S to s
      Then { direct traversal toward offspring s by setting S = s; Found = true;
             CharacterInProgress = next character of AnalyzeWord }
      Otherwise {
        s₂ = offspring of S following s;
        Prefix(s₂) = Prefix(s) + SubTree(s)Size;
        LastPostfix = Postfix(s);
        OffspringSubTreesSize += SubTree(s)Size;
        analyze the next offspring by setting s = s₂
      }
    }
    If not Found, Return IncompleteTraversalCode;
  }
  Return Prefix(S);
End.
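TreeTraversal can be sketched in compilable C as below. The structural Table alone does not hold the arc tags, so this sketch assumes a hypothetical side array label[] indexed by prefix rank, holding the tag of the arc that enters each node (the root's entry is unused); −1 stands in for IncompleteTraversalCode. The word is expected to end with the end-of-word marker borne by the leaves.

```c
#include <assert.h>

#define INCOMPLETE_TRAVERSAL (-1)   /* stands in for IncompleteTraversalCode */

/* bijection[prefix rank] = postfix rank (FIG. 3B-style Table). Returns the
 * prefix rank of the leaf reached, i.e. the dictionary index of word. */
int tree_traversal(const char *word, const int *bijection, const char *label)
{
    assert(word != 0);
    int prefix_S = 0;                        /* S = root */
    int size_S = bijection[0] + 1;           /* root postfix rank is n - 1 */
    int last_postfix = -1;
    while (size_S != 1) {                    /* S is not yet a leaf */
        char c = *word++;                    /* CharacterInProgress */
        int prefix = prefix_S + 1;           /* eldest son of S */
        int offspring_size = 0, found = 0;
        while (offspring_size < size_S - 1) {
            int postfix = bijection[prefix];
            int size = postfix - last_postfix;   /* Gamma(s) */
            if (label[prefix] == c) {            /* descend: S = s */
                prefix_S = prefix;
                size_S = size;                   /* LastPostfix is kept */
                found = 1;
                break;
            }
            last_postfix = postfix;              /* lop off the subtree of s */
            offspring_size += size;
            prefix += size;                      /* next offspring */
        }
        if (!found)
            return INCOMPLETE_TRAVERSAL;
    }
    return prefix_S;
}
```

For the dictionary {“a”, “ab”, “b”} with a space as end-of-word marker, the Table reads {6, 3, 0, 2, 1, 5, 4} and label[] reads {0, 'a', ' ', 'b', ' ', 'b', ' '}; tree_traversal("ab ", ...) then returns 4, while "ba " yields −1.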

Conversely, if it is desired to generate the character chain of the dictionary entry corresponding to an index I (assumed equal to the prefix rank of a tree leaf), a second traversal method of the present invention is used. This second method differs from the first in that the offspring node is now selected by comparing prefix ranks with the searched-for index I. More specifically, the offspring s_i is selected as soon as the following relation holds:

$\mathrm{PrefixRank}(s_{i+1}) > I$

The index I is thus approached through increasing prefix-rank values of the analyzed nodes.

The second method relies on the same iterative calculation of the prefix ranks of the offspring of a given node S from the respective subtree sizes Γ(s_i). The criterion for stopping the analysis of the offspring s_i of a given node S is likewise based on comparing Γ(S) with the sum of the Γ(s_i) of the offspring already analyzed.

One may define a function GenerateWord which, starting from an index PrefixGuide, returns the corresponding character chain. This function also uses the computerized data array of the tree of the invention. Its C-like pseudo-code is shown below:

Function chain GenerateWord (index PrefixGuide, Table Bijection)
Begin
  S = Root; Prefix(S) = 0; // root prefix index
  SubTree(S)Size = Bijection[Prefix(S)] + 1;
  LastPostfix = −1; GeneratedChain = empty chain;
  As long as SubTree(S)Size is different from 1 {
    s = eldest son of S; Prefix(s) = Prefix(S) + 1;
    OffspringSubTreesSize = 0; Found = false;
    As long as OffspringSubTreesSize < SubTree(S)Size − 1 and not Found {
      s₂ = offspring of S following s;
      Postfix(s) = Bijection[Prefix(s)];
      SubTree(s)Size = Postfix(s) − LastPostfix;
      Prefix(s₂) = Prefix(s) + SubTree(s)Size;
      If PrefixGuide < Prefix(s₂)
      Then { append the tag of the arc from S to s to GeneratedChain;
             select the path toward the preceding offspring s by setting S = s; Found = true }
      Otherwise {
        LastPostfix = Postfix(s);
        OffspringSubTreesSize += SubTree(s)Size;
        analyze the following offspring by setting s = s₂
      }
    }
    If not Found, Return IncompleteTraversalCode;
  }
  Return GeneratedChain;
End.
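GenerateWord can be sketched in C the same way, under the same assumptions (bijection[] indexed by prefix rank, a hypothetical label[] side array of arc tags, −1 for IncompleteTraversalCode). The output buffer and its capacity are additions of this sketch. Because the loop always descends to some leaf, a final check rejects an index that is not the prefix rank of a leaf.

```c
#include <assert.h>
#include <stddef.h>

#define INCOMPLETE_TRAVERSAL (-1)   /* stands in for IncompleteTraversalCode */

/* Writes into out (capacity cap, NUL-terminated) the tags on the path from
 * the root to the leaf of prefix rank prefix_guide; returns the chain length,
 * or INCOMPLETE_TRAVERSAL if the index is not a leaf or out is too small. */
int generate_word(int prefix_guide, const int *bijection, const char *label,
                  char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    int prefix_S = 0, size_S = bijection[0] + 1, last_postfix = -1;
    size_t len = 0;
    if (prefix_guide < 0 || prefix_guide >= size_S)
        return INCOMPLETE_TRAVERSAL;
    while (size_S != 1) {                            /* S is not yet a leaf */
        int prefix = prefix_S + 1, offspring_size = 0, found = 0;
        while (offspring_size < size_S - 1) {
            int postfix = bijection[prefix];
            int size = postfix - last_postfix;       /* Gamma(s) */
            int next_prefix = prefix + size;         /* Prefix(s2) */
            if (prefix_guide < next_prefix) {        /* PrefixGuide lies under s */
                if (len + 1 >= cap)
                    return INCOMPLETE_TRAVERSAL;
                out[len++] = label[prefix];          /* update GeneratedChain */
                prefix_S = prefix;
                size_S = size;
                found = 1;
                break;
            }
            last_postfix = postfix;
            offspring_size += size;
            prefix = next_prefix;
        }
        if (!found)
            return INCOMPLETE_TRAVERSAL;
    }
    out[len] = '\0';
    return prefix_S == prefix_guide ? (int)len : INCOMPLETE_TRAVERSAL;
}
```

With the Table {6, 3, 0, 2, 1, 5, 4} and the labels of the dictionary {“a”, “ab”, “b”}, generate_word(4, ...) writes "ab " and returns 3, while the inner-node index 1 yields −1.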

1. A method comprising: constructing a table representing a directed tree of data entries in a set of data, each data entry of said set being associated with a particular node of said tree, wherein constructing comprises: assigning, in a processor arrangement, a first index to each node of said tree, the first index representing the node rank according to a first bijective order relation ordering all the nodes of the tree according to a combination of (a) a descending order relation ordering a node relative to its descendants and (b) a primogeniture order relation of the nodes which are the offspring of one of said nodes; assigning, in the processor arrangement, a second index to each node of said tree, the second index representing a node rank according to a second bijective order relation ordering all the nodes of the tree according to a combination of (a) the inverse order relation of said descending order relation and (b) said primogeniture order relation, wherein a given node s2 of the directed tree depends on another node s1 of the directed tree if and only if the node rank of s2 according to the first order relation is greater than the node rank of s1 according to the first order relation and the node rank of s2 according to the second order relation is less than the node rank of s1 according to the second order relation; and storing, in the table with the processor arrangement, values representing the first index of nodes in the tree at addresses representing the second index of nodes in the tree.
 2. The method of claim 1, further including determining, in the processor arrangement, the size of a node subtree of an arbitrary tree node by performing a recursive calculation.
 3. The method of claim 2, wherein the tree includes a parent node from which first offspring and second offspring descend, said first and second offspring being adjacent in a list of offspring ordered according to the primogeniture order and stored in the processor arrangement, the recursive calculation determining (a) the first index of the second offspring starting from the first index of the first offspring and the size of the subtree derived from the first offspring, and (b) the second index of the second offspring starting from the second index of the first offspring and the size of the subtree derived from the second offspring.
 4. The method of claim 3, wherein the recursive calculation stage determines the first index of the offspring classified as being first in a list starting from the first index of said parent node and the second index of said parent node starting from the second index of the offspring classified as being last in said list.
 5. The method of claim 2, further including determining, in the processor arrangement, the size of the subtree derived from said parent node starting from the sum of the sizes of the subtrees derived from the offspring of the parent by performing said recursive calculation.
 6. The method of claim 1, further including causing the processor arrangement to use pointers on a first tree data array so that for a given node, the processor arrangement derives (a) a first kind of pointer that in turn derives a first offspring node of the given node according to the descending order relation and (b) a second kind of pointer that in turn derives the list of a second offspring node of the given node.